Results 1 - 3 of 3
1.
IEEE Journal on Selected Areas in Communications ; 41(1):107-118, 2023.
Article in English | Scopus | ID: covidwho-2245641

ABSTRACT

Video represents the majority of internet traffic today, driving a continual race between the generation of higher-quality content, the transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ('talking-head videos') to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning-based voice cloning and lip-syncing models. Our generative pipeline achieves a two to three orders of magnitude reduction in bitrate compared to standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n = 242) in an online study. The Txt2Vid framework opens up the potential for novel applications, such as enabling audio-video communication during poor internet connectivity or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git. © 1983-2012 IEEE.
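The abstract's bitrate claim can be sanity-checked with back-of-the-envelope arithmetic: a live transcript of speech is orders of magnitude smaller than even low-bitrate video. The speaking rate and character sizes below are illustrative assumptions, not figures from the paper; only the ~100 kbps to a few Mbps video range comes from the abstract.

```python
# Rough check of why sending a text transcript instead of video yields an
# orders-of-magnitude bitrate reduction. Speaking rate (150 wpm) and word
# length (6 chars) are common ballpark assumptions, not the paper's numbers.

def transcript_bitrate_bps(words_per_min=150, chars_per_word=6, bits_per_char=8):
    """Approximate bitrate of a plain-text transcript of live speech."""
    chars_per_sec = words_per_min * chars_per_word / 60
    return chars_per_sec * bits_per_char

def reduction_factor(video_bitrate_bps, text_bitrate_bps):
    """How many times smaller the text stream is than the video stream."""
    return video_bitrate_bps / text_bitrate_bps

text_bps = transcript_bitrate_bps()      # 150 wpm * 6 chars * 8 bits / 60 s = 120 bps
low, high = 100_000, 2_000_000           # ~100 kbps to a few Mbps, per the abstract
print(f"transcript: {text_bps:.0f} bps")
print(f"reduction: {reduction_factor(low, text_bps):.0f}x to "
      f"{reduction_factor(high, text_bps):.0f}x")
```

Under these assumptions the transcript stream is roughly 120 bps, giving a reduction in the hundreds-to-thousands range against the video bitrates cited in the abstract, in line with the "two to three orders of magnitude" claim (the paper's exact figures also depend on its decoder-side models).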

2.
18th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2022 ; 2022-October:381-386, 2022.
Article in English | Scopus | ID: covidwho-2152557

ABSTRACT

The global spread of coronavirus has sparked considerable interest in technologies that facilitate seamless communication between users who are physically distant. Using current remote collaboration systems that combine 3D sensing with LiDAR and depth cameras, point cloud streaming, and MR/VR devices, distant users can communicate with each other as if they were meeting in person. However, these systems may violate users' privacy, since they can share information about a user's entire personal space with other users. In addition, although various point cloud compression methods have been proposed, remote transmission of 3D scenes still requires significant bandwidth. This paper proposes a 3D spatial data sharing system based on the paradigm of 'semantic communication', i.e., controlling communication at the granularity of semantic objects. Our system understands the semantics of the scene and leverages point cloud streaming, thereby enabling users to assert fine-grained control over their privacy. Further, the system adaptively controls the size of the data frame based on network capacity and scene context. The experimental results show that network delay can be reduced by 96%. We have also tested our system on a commercial 4G network, showing that 3D spatial sharing with point clouds is possible even under severe network conditions. © 2022 IEEE.
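The two mechanisms this abstract describes, per-object privacy control and frame-size adaptation to network capacity, can be sketched in a few lines. The data model, label names, and subsampling policy below are hypothetical illustrations, not the paper's actual system or API.

```python
# Sketch of "semantic communication" for point-cloud sharing: only points
# belonging to semantic objects the user has whitelisted are streamed, and
# each frame is subsampled to fit a network budget. All names are illustrative.

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float
    label: str  # semantic class assigned by a scene-understanding model

def filter_by_policy(points, shared_labels):
    """Privacy control: keep only points of objects the user agreed to share."""
    return [p for p in points if p.label in shared_labels]

def budgeted_frame(points, max_points):
    """Crude rate adaptation: subsample the frame to fit the network budget."""
    if len(points) <= max_points:
        return points
    step = len(points) / max_points
    return [points[int(i * step)] for i in range(max_points)]

scene = [Point(0, 0, 0, "person"), Point(1, 0, 0, "desk"),
         Point(2, 1, 0, "whiteboard"), Point(3, 1, 1, "person")]
frame = filter_by_policy(scene, shared_labels={"person", "whiteboard"})
frame = budgeted_frame(frame, max_points=2)
print(len(frame))  # 2
```

In a real system the semantic labels would come from a 3D segmentation model and the point budget from measured throughput; the point of the sketch is that privacy filtering and rate adaptation both become simple per-object operations once the scene is labeled.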

3.
IEEE Journal on Selected Areas in Communications ; : 1-1, 2022.
Article in English | Scopus | ID: covidwho-2152491

ABSTRACT

Video represents the majority of internet traffic today, driving a continual race between the generation of higher-quality content, the transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning-based voice cloning and lip-syncing models. Our generative pipeline achieves a two to three orders of magnitude reduction in bitrate compared to standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n = 242) in an online study. The Txt2Vid framework opens up the potential for novel applications, such as enabling audio-video communication during poor internet connectivity or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git. IEEE
